

VLSI  RESEARCH 


SEMI-ANNUAL 

TECHNICAL R&D STATUS REPORT
APRIL  -  OCTOBER  1983 


PRINCIPAL  INVESTIGATOR 


R.W.  BRODERSEN 
(415)642-1779


FACULTY  RESEARCHERS 

R.W. BRODERSEN
N.  CHEUNG 
A.  DESPAIN 
D.A.  HODGES 

C. HU 

D.  MESSERSCHMITT 
R.S. MULLER

A.  NEUREUTHER 

A.P.  NEWTON 

W.G.  OLDHAM 

J.  OUSTERHOUT 

D.A.  PATTERSON 

A. SANGIOVANNI-VINCENTELLI

C. SEQUIN


SPONSORED  BY 

DEFENSE  ADVANCED  RESEARCH  PROJECTS  AGENCY 
ARPA ORDER NO. 4031

MONITORED  BY  NAVAL  ELECTRONIC  SYSTEMS  COMMAND 
UNDER  CONTRACT  NO.  N00039-81-K-0251 

The  views  and  conclusions  contained  in  this 
document  are  those  of  the  authors  and  should 
not be interpreted as representing the official
policies,  either  expressed  or  implied,  of  the 
Defense  Advanced  Research  Projects  Agency  or 
the  U.S.  Government. 


ELECTRONICS  RESEARCH  LABORATORY 
COLLEGE  OF  ENGINEERING 

UNIVERSITY OF CALIFORNIA, BERKELEY, CA 94720



TABLE  OF  CONTENTS 


1. SECOND GENERATION RISCs

2. ARCHITECTURE FOR SOFTWARE PROTOTYPING

3.  MULTIPROCESSOR  CIRCUIT  SIMULATION 

4.  NOVEL, HIGH-PERFORMANCE  ARCHITECTURES 

5.  SPEECH  RECOGNITION 

6.  MACROCELLS  FOR  SIGNAL  PROCESSING 

7.  RESEARCH  IN  CIRCUIT  SIMULATION  AND  LAYOUT 

8.  NEW  VLSI  TOOLS  DISTRIBUTION 

9.  BERKELEY  ADVANCED  CMOS 

10. TECHNOLOGY


1.  SECOND  GENERATION  RISCs  (Patterson,  Sequin) 

We have submitted the RISC II processor and instruction cache for fabrication.

The 43,000 transistor RISC II chip uses a compact dual bus register cell,
providing 60% more registers in a 25% smaller chip using the same design rules.
However, the sharing of the bit lines for reading and writing required an extra
pipe stage plus operand forwarding circuits. Since the chip was thoroughly
checked with the Crystal timing analyzer, we expect this CPU to come close to
our original performance goals [2] [4].

The 46,000 transistor RISC II instruction cache uses several new architectural
ideas. For example, the addition of one bit per word enables us to turn off
bad words, thus doubling the yield of this chip, and an instruction address
predictor effectively doubles the speed of the cache memory [5]. Our long term
goal is to combine the cache on the same chip with the processor.

Through talks, panels, debates, and papers we have continued to explain the
RISC concept [1] [3] [8], and the methodologies used for its design and
implementation [6] [9].


[1] D.A. Patterson: "Microprogramming," Scientific American, Vol. 248, No. 3,
March 1983, pp. 36-43.

[2] J.K. Foderaro, K.S. Van Dyke, and D.A. Patterson: "Running RISCs," VLSI
Design, Vol. III, No. 5, pp. 27-32, September/October 1982.

[3] D.A. Patterson and R.S. Piepho: "Assessing RISCs in High-Level Language
Support," IEEE Micro, November 1982.

[4] D.A. Patterson and C.H. Sequin: "A VLSI RISC," Computer, Vol. 15, No. 9,
pp. 8-21, September 1982.

[5] D.A. Patterson, P. Garrison, M. Hill, D. Lioupis, C. Nyberg, T. Sippel,
and K. Van Dyke: "Architecture of a VLSI Instruction Cache for a RISC," Tenth
Annual Symposium in Computer Architecture, June 14-16, 1983.

[6] C.H. Sequin and D.A. Patterson: "Design and Implementation of RISC I",
Computer Science Report No. UCB/CSD 82/106, Oct. 1982, 23 pages.

[7] R.M. Fujimoto and C.H. Sequin: "The Impact of VLSI on Communications in
Multiprocessor Networks", Proc. COMPSAC 82, pp. 231-238, Chicago, Nov.
10-12, 1982.

[8] Y. Tamir and C.H. Sequin: "Strategies for Managing the Register File in
RISC", accepted for publication in IEEE Trans. on Computers (~Oct. 1983),
35 pages.

[9] C.H. Sequin: "Managing VLSI Complexity: an Outlook", IEEE Proceedings,
Vol. 71, No. 1, pp. 149-166, Jan. 1983.


2.  ARCHITECTURE  FOR  SOFTWARE  PROTOTYPING  (Patterson) 

Our first step towards building a machine to run exploratory programming
environments is to make a version of Smalltalk-80 that runs on the VAX under
Berkeley UNIX. We have completed that first step, called Berkeley Smalltalk [1].
This system will serve as the 'guinea pig' for our architecture experiments and
will also be the software kernel for our new machine. In addition, although there
are three other implementations of Smalltalk-80 using various operating systems
and languages on the VAX, Berkeley Smalltalk is the fastest. Our initial
architecture investigations have found that several of the ideas from RISC work
well with Smalltalk. For example, the register window studies for RISC I found
that the best buffer size was 8 windows. We have completed a similar study for
Smalltalk and found that the knee in the curve is also at 8 windows. Perhaps the
nicest discovery is that the reduced instruction set seems compatible with
Smalltalk. Even though an operator such as '+' can mean a wide variety of
operations (e.g., floating point add, logical or, set union in addition to integer
addition) that cannot be determined until runtime, we found that 93% of the
time both operands are simple integers. Similarly, the full blown semantics of
Smalltalk requires a table lookup for each procedure call to see which
procedure is to be activated, but in 95% of the cases the call goes to the same
procedure as last time. Thus we plan to short circuit the table lookup to
accelerate execution. Our current architecture viewpoint is to use the same RISC
philosophy to design and build a machine that runs fast for the common cases
yet is easy to implement. It will be interesting to see if this approach will
once again yield a machine with surprisingly high performance.
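
The idea of short circuiting the table lookup can be illustrated with a small
software model. The sketch below is not the Berkeley Smalltalk mechanism; it is
a minimal Python illustration, assuming each call site remembers the class and
procedure it used last time. The names SendSite and full_lookup are
hypothetical.

```python
def full_lookup(cls, selector):
    """Slow path: search the class and its ancestors for the selector."""
    for klass in cls.__mro__:
        if selector in vars(klass):
            return vars(klass)[selector]
    raise AttributeError(selector)


class SendSite:
    """One call site that caches the result of its most recent lookup."""

    def __init__(self, selector):
        self.selector = selector
        self.cached_class = None
        self.cached_method = None
        self.hits = 0
        self.misses = 0

    def send(self, receiver, *args):
        cls = type(receiver)
        if cls is self.cached_class:
            # Fast path: same receiver class as last time, skip the lookup.
            self.hits += 1
        else:
            # Slow path: do the full table lookup and remember the answer.
            self.misses += 1
            self.cached_method = full_lookup(cls, self.selector)
            self.cached_class = cls
        return self.cached_method(receiver, *args)


if __name__ == "__main__":
    plus = SendSite("__add__")
    for a, b in [(1, 2), (3, 4), (5, 6), (1.5, 2.0)]:
        print(plus.send(a, b))
    print("hits:", plus.hits, "misses:", plus.misses)
```

When most calls at a site see the same receiver class, as the measurements
above suggest, nearly every call takes the fast path.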


[1] D.M. Ungar and D.A. Patterson: "Berkeley Smalltalk: Who Knows Where the
Time Goes," Smalltalk-80: Bits of History, Words of Advice, Addison-Wesley,
Summer '83.


3. MULTIPROCESSOR CIRCUIT SIMULATION (Messerschmitt)

Code  generators  for  LU  decomposition  in  a  SPICE  circuit  simulator  running 
on  a  multiprocessor  architecture  have  been  completed.  These  have  been  run  on 
a  multiprocessor  simulator,  and  their  performance  evaluated.  Improved 
scheduling  algorithms  which  take  into  account  interprocessor  communications 
latency  have  been  developed  and  evaluated,  and  yield  a  30  to  50  percent 
speedup.  Further  improvements  are  being  pursued. 

Processor interconnection topology and routing algorithms are being pursued.
Automatic generation of topology from traffic statistics has been implemented
using a clustering algorithm. Measures of interconnection hardware
complexity and speed performance are being developed in order to evaluate
alternate interconnection topologies.


4. NOVEL, HIGH-PERFORMANCE ARCHITECTURES (Despain, Patel and Baden)

We have determined that functional programming languages provide a
natural framework for programming highly parallel computers [1]. Our goal was
to adopt a programming style that allows the programmer to express parallelism
without being overburdened with low-level hardware dependencies.

We developed a compiler for Backus' functional language, FP [2], called
Berkeley FP; it has been released on 4.2 BSD UNIX. A recent study of the
Berkeley FP code generator has shown that FP's underlying algebra can be used
to make dramatic improvements in the quality of the code generated.
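
One well-known law of Backus' algebra of programs states that applying g to
every element and then applying f to every element equals applying the
composition of f and g to every element once. The sketch below illustrates how
such a law can remove an intermediate sequence. It is a Python illustration
only; the report does not say which laws the Berkeley FP study used, and the
function names are assumptions.

```python
def compose(f, g):
    """FP composition: (f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))


def apply_to_all(f):
    """FP 'apply to all': maps f over every element of a sequence."""
    return lambda xs: [f(x) for x in xs]


def fuse_maps(f, g):
    """Rewrite (apply_to_all f) o (apply_to_all g) as apply_to_all (f o g).

    The rewritten form makes one pass and builds no intermediate list.
    """
    return apply_to_all(compose(f, g))


if __name__ == "__main__":
    double = lambda x: 2 * x
    increment = lambda x: x + 1
    data = [1, 2, 3, 4]

    two_passes = compose(apply_to_all(double), apply_to_all(increment))
    one_pass = fuse_maps(double, increment)
    print(two_passes(data), one_pass(data))
```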

[1] S.B. Baden, D.R. Patel, "Berkeley FP - Experiences with a Functional
Programming Language", COMPCON '83, February 28 - March 3, 1983, San
Francisco, California.

[2] S.B. Baden, "Berkeley FP User's Manual, Rev. 4.1", December 15, 1982,
available on 4.2 BSD.


5. SPEECH RECOGNITION (Brodersen)


The goal of this project is to design an MOS-LSI speech recognition system that
is capable of accurately recognizing a large vocabulary of isolated words in real
time. This system is based on two custom-designed integrated circuits,
memory, and a control microprocessor [1].

The algorithm used is similar to that used in other dynamic-time-warped,
template-based automatic speech recognition (ASR) systems. It maintains a
dictionary of model words or templates to which all input words are compared. The
template that is most similar to the word just spoken is recognized and some
associated action is performed. If no words are similar enough to the input, no
action is performed. This dictionary can either be filled by the user in a training
phase prior to usage (speaker-dependent mode) or it can be used in a
speaker-independent mode by having multiple templates of each word which span the
various types of speakers.
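
The template comparison described above can be sketched in a few lines. This is
a minimal Python illustration of dynamic time warping with a rejection
threshold, not the algorithm implemented on the chips. It assumes each frame is
a single number, whereas a real front end would supply a vector of filter-bank
outputs per frame, and the names dtw_distance, recognize and reject_above are
hypothetical.

```python
def dtw_distance(word, template):
    """Cost of the best time alignment between two feature sequences."""
    inf = float("inf")
    n, m = len(word), len(template)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(word[i - 1] - template[j - 1])
            # Either sequence may be stretched in time, or both may advance.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


def recognize(word, templates, reject_above):
    """Return the label of the closest template, or None if none is close."""
    best_label, best_cost = None, float("inf")
    for label, template in templates.items():
        d = dtw_distance(word, template)
        if d < best_cost:
            best_label, best_cost = label, d
    return best_label if best_cost <= reject_above else None


if __name__ == "__main__":
    templates = {"yes": [1, 3, 4, 3, 1], "no": [5, 5, 1, 1]}
    slow_yes = [1, 1, 3, 4, 4, 3, 1]  # the same shape, spoken more slowly
    print(recognize(slow_yes, templates, reject_above=2.0))
    print(recognize([9, 9, 9], templates, reject_above=2.0))
```

A speaker-independent dictionary fits the same sketch by storing several
templates per word under distinct keys that map back to one word.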

The first of the custom chips analyzes the input speech by the use of
a filter bank, decides when a word was spoken, and passes on an internal
representation of the word to a word comparator. The word comparator is the
second custom chip and can process up to 1000 vocabulary words (500 seconds
of speech) in real time. The microprocessor collects the outputs of the word
comparators and uses them along with other syntactic or semantic information
to make its recognition decision. The microprocessor then passes on the
recognized word to the host system or performs some further application system
task.

The system can also be configured to recognize connected speech. In this
mode it uses the isolated-word recognizer to spot words inside the phrase and
then it ties these words together using an algorithm performed by the control
microprocessor.

A TTL version of the recognition system, based on three circuit boards, has
been working for some time. Recently the chip that performs the word
comparison was completed, and it has essentially replaced one of these boards
[2]. A second version of this chip is now in design that will include some of the
glue logic that is now being done in TTL.

The  chip  to  perform  the  front  end  functions  of  spectral  analysis  and  end 
point  detection  (for  isolated  speech)  is  now  in  layout  using  a  macrocell  approach 
to  the  design. 


[1] H. Murviet, M. Lowy and R.W. Brodersen, "A 1000 Word Recognition System
Using MOS-LSI", Proc. of VLSI in Signal Processing, Nov. 1982, pp. 90-95.

[2] M. Lowy, H. Murviet, D. Mintz and R.W. Brodersen, "An Architecture for a
Speech Recognition System", ISSCC Technical Digest, New York, Feb. 1983.


6. MACROCELLS FOR SIGNAL PROCESSING (Brodersen)

Design of a 23,000 transistor NMOS integrated circuit implementing an LPC
vocoder function has been completed. Testing and verification of the device is
currently under way. The device implements a lattice analyzer, pitch/voicing
analyzer, and lattice synthesizer. The tenth-order adaptive lattice analysis filter
uses an algorithm similar to that described by Fellman [1]. The pitch/voicing
analyzer uses Gold's algorithm [2]. The synthesizer consists of an excitation
generator based on a voiced/unvoiced speech model and a lattice synthesis filter.
Alternatively, the excitation may be taken from an external source, allowing use
in baseband encoding schemes.

The device interfaces directly to A/D, D/A and host microcomputer devices,
allowing a low-parts-count system implementing a full-duplex 2400 bits/sec
speech transmission system conforming to DARPA and DoD speech communications
protocols.

A macrocell library for signal processing circuits has been developed to
facilitate the design of a number of signal processing LSI circuits including the
LPC vocoder described above [3]. This library supports the rapid design of
semi-custom circuits with applications in speech processing, image processing
and data communications. Processors, control sequencers and other large
functional units may be configured from this library into multiprocessor circuits
with high functional throughput.

Three circuit designs are in progress in addition to the LPC vocoder: a
16-channel filter-bank circuit; a word endpoint detector circuit; and a
variable-bit-rate formant speech synthesizer.


[1] R.D. Fellman, "An MOS-LSI Adaptive Linear Prediction Filter for Speech
Processing", UCB/ERL Memo M82/82, University of California, Berkeley, Ca.,
Nov. 1982.

[2] B. Gold, L. Rabiner, "Parallel Processing Techniques for Estimating Pitch
Periods of Speech in the Time Domain", J. Acoust. Soc. of Amer., V. 46, No. 2
(part 2), 1969, pp. 442-448.

[3] S.P. Pope, R.W. Brodersen, "Macrocell Design for Concurrent Signal
Processing", Proc. 3rd Annual Conf. on VLSI, California Inst. of Tech., Pasadena,
Ca., March 1983.


7.  RESEARCH  IN  CIRCUIT  SIMULATION  AND  LAYOUT  (R.  Newton  and  A. 
Sangiovanni-Vincentelli) 

We focused in this funding period on three topics: accurate electrical
simulation with relaxation-based algorithms, optimal PLA design, and channel
routing. Channel routing is a new area for us and we had some interesting
preliminary results reported in [8].

7.1.  Circuit  simulation 

We concentrated on extensions and improvements of algorithms in the class
known as Waveform Relaxation. In these methods, large circuits are decomposed
into many loosely coupled small circuits (subcircuits), and then the transient
response waveform for each subcircuit is calculated by "guessing" the behavior
of the surrounding subcircuits. The calculated responses for each subcircuit
are used to improve the "guesses", and the transient response waveforms are
recalculated. The procedure is iterated until the waveforms converge. We
implemented the waveform relaxation algorithm for the special case of MOS
circuits. A test program, RELAX, calculated accurate transient responses for
large subcircuits up to 60 times faster than SPICE.
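
The iteration described above can be sketched on a toy problem. The code below
is not RELAX; it is a minimal Python illustration, assuming two identical
first-order linear subcircuits, dx1/dt = -a*x1 + c*x2 + u and
dx2/dt = -a*x2 + c*x1, weakly coupled through c, with each subcircuit integrated
by backward Euler over the whole time interval. All names and parameter values
are hypothetical.

```python
def solve_subcircuit(a, drive, x0, dt):
    """Backward Euler for dx/dt = -a*x + drive(t), given samples of drive."""
    x = [x0]
    for k in range(1, len(drive)):
        x.append((x[-1] + dt * drive[k]) / (1.0 + a * dt))
    return x


def waveform_relaxation(a=1.0, c=0.2, u=1.0, t_end=5.0, steps=500,
                        tol=1e-9, max_iter=100):
    """Iterate whole waveforms until successive iterates agree within tol."""
    dt = t_end / steps
    x1 = [0.0] * (steps + 1)  # initial "guess" for each waveform
    x2 = [0.0] * (steps + 1)
    for iteration in range(1, max_iter + 1):
        # Solve subcircuit 1 using the latest guess for its neighbor,
        # then subcircuit 2 using the waveform just computed.
        new_x1 = solve_subcircuit(a, [c * v + u for v in x2], 0.0, dt)
        new_x2 = solve_subcircuit(a, [c * v for v in new_x1], 0.0, dt)
        change = max(max(abs(p - q) for p, q in zip(new_x1, x1)),
                     max(abs(p - q) for p, q in zip(new_x2, x2)))
        x1, x2 = new_x1, new_x2
        if change < tol:
            return x1, x2, iteration
    raise RuntimeError("waveforms did not converge")


def direct_solution(a, c, u, t_end, steps):
    """Reference: backward Euler on the coupled system, solved step by step."""
    dt = t_end / steps
    d, e = 1.0 + a * dt, c * dt
    det = d * d - e * e
    x1, x2 = [0.0], [0.0]
    for _ in range(steps):
        r1, r2 = x1[-1] + dt * u, x2[-1]
        x1.append((d * r1 + e * r2) / det)
        x2.append((e * r1 + d * r2) / det)
    return x1, x2


if __name__ == "__main__":
    w1, w2, iterations = waveform_relaxation()
    d1, d2 = direct_solution(1.0, 0.2, 1.0, 5.0, 500)
    print("iterations:", iterations)
    print("largest difference:", max(abs(p - q) for p, q in zip(w1, d1)))
```

Because each subcircuit is solved on its own, different subcircuits could use
different time steps or run on different processors, which is what makes the
approach attractive for large circuits.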

RELAX2 [1], a new program for the accurate simulation of MOS circuits, has
been written in C. RELAX2 handles a broader class of MOS digital circuits than
the RELAX program, and includes several new techniques to reduce computation
time. In RELAX the user had to describe his circuit with predefined MOS
subcircuits, such as NAND gates, NOR gates, or flip-flops, but RELAX2 allows the
user to define and use his own subcircuits. Also, the RELAX program converged
slowly when used to simulate circuits with logical feedback (e.g. finite-state
machines). The RELAX2 program solves this problem by allowing the user to break
the simulation up into several "windows" of time. This technique effectively
breaks the logical feedback loop, and increases the speed of convergence of the
waveform relaxation method. Another speed-up technique involves changing the
accuracy of the calculation of the subcircuit waveforms with the iteration, so
that for the first few iterations the waveforms are calculated quickly and
approximately, but by the final iteration they are calculated accurately. We have
been able to prove rigorously that the convergence properties of the waveform
relaxation methods do not change if these speed-up techniques are used [2]. One
way to approximate the calculation of the subcircuit waveforms is to use a simple
resistor and switch model for the MOS devices, and then change to the more
complex Shichman-Hodges model as the waveforms approach convergence.
Experimental results from this technique have not been as good as expected. A
more successful approach was to change the amount of error allowed by the
numerical integration algorithm used to solve for the subcircuit waveforms. Here,
unlike changing the device models, it is possible to increase the accuracy of the
calculation of the waveforms at each iteration by tightening this error
tolerance. The results from this approach were better, providing a factor-of-two
speedup in many cases. In addition, the RELAX2 program provides information
about the topology of the circuit, which will aid the investigation of the
available parallelism in the waveform relaxation method.

7.2. Optimal topological design of PLAs

We continued our work on efficient algorithms for optimal topological
design of PLAs by improving the algorithms for constrained multiple folding. In
particular, we defined a graph theoretic interpretation of the multiple PLA
folding problem [3,5] and then we defined a constrained optimization problem to
achieve minimal silicon area occupation with constrained positions of electrical
inputs and outputs [4,5]. We implemented the algorithms in a new version of
PLEASURE, a program for constrained/unconstrained simple/multiple
folding [4,5].


[1] J. White and A.L. Sangiovanni-Vincentelli, "RELAX2: A Modified Waveform
Relaxation Approach to the Simulation of MOS Digital Circuits", Proc. 1983
International Symposium on Circuits and Systems, Newport Beach, California,
May 1983.

[2] E. Lelarasmee and A. Sangiovanni-Vincentelli, "Some New Results on
Waveform Relaxation Algorithms for the Simulation of Integrated Circuits",
Proc. of 1982 IEEE Int. Large Scale System Symposium, Virginia Beach,
Oct. 1982, pp. 371-376.

[3] G. De Micheli and A. Sangiovanni-Vincentelli, "Multiple Folding of
Programmable Logic Arrays", Proc. 1983 Int. Symp. on Circ. and Syst., Newport
Beach, May 1983.

[4] G. De Micheli and A. Sangiovanni-Vincentelli, "PLEASURE: A Computer
Program for Simple/Multiple Constrained/Unconstrained Folding of
Programmable Logic Arrays", Proc. 1983 Design Automation Conference, Miami
Beach, June 1983.

[5] G. De Micheli and A. Sangiovanni-Vincentelli, "PLEASURE: A Computer
Program for Simple/Multiple Constrained/Unconstrained Folding of
Programmable Logic Arrays", ERL Memo, ERL 82/56, December 1982.


8. NEW VLSI TOOLS DISTRIBUTION

We have assembled a new release of several of our VLSI Tools. The new
release includes about 25 programs. It contains updated versions of Caesar and
Mextra and other old programs, as well as several previously-unreleased
programs, such as Lyra, Crystal, Peg, and Tpack. The 1983 release was sent to eight
beta test sites in January, and began general distribution on April 1.

8.1.  Tpack:  A  System  for  Combining  Graphics  and  Procedures  (Ousterhout, 
Newton) 

We have developed a style of module generation called tile packing. In this
style, a module generator consists of two parts: a collection of graphical tiles
that define building blocks, and a program that arranges the building blocks into
modules such as PLAs, ROMs, and decoders. The tile packing approach has the
advantage of separating the technology and design rule information (kept in
tiles) from the arrangement information (kept in the program and in truth
tables). This makes it easy to construct simple module generators, and ensures
that more complex module generators are not obsoleted by changes in design
rules. We have built a new PLA generator, Tpla, using the tile packing approach,
and have retargeted it for several different NMOS design rules. Work is
underway to retarget the same program to CMOS, and to build splitting and folding
PLA generators.
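
The separation between tiles and arrangement can be shown with a toy model.
The sketch below is not Tpack or Tpla and does not use their file formats; it
is a minimal Python illustration in which a tile is a small block of
characters, and the same arrangement program builds a PLA AND plane from a
truth table using whichever tile set it is given. The tile sets TILES_A and
TILES_B and the function names are hypothetical.

```python
def hstack(tiles):
    """Abut tiles left to right; all tiles must have the same height."""
    height = len(tiles[0])
    assert all(len(t) == height for t in tiles)
    return ["".join(t[r] for t in tiles) for r in range(height)]


def vstack(tiles):
    """Abut tiles top to bottom; all rows must have the same width."""
    width = len(tiles[0][0])
    assert all(len(row) == width for t in tiles for row in t)
    return [row for t in tiles for row in t]


def and_plane(product_terms, tileset):
    """Arrange one tile per truth-table entry: '1', '0' or '-' (don't care)."""
    rows = [hstack([tileset[entry] for entry in term])
            for term in product_terms]
    return vstack(rows)


# Two stand-ins for different design rules. Only the tiles differ; the
# arrangement program above is untouched when the rules change.
TILES_A = {"1": ["T-"], "0": ["-C"], "-": ["--"]}
TILES_B = {"1": ["T--", "..."], "0": ["--C", "..."], "-": ["---", "..."]}


if __name__ == "__main__":
    truth_table = ["10-", "-11"]
    for tileset in (TILES_A, TILES_B):
        print("\n".join(and_plane(truth_table, tileset)))
        print()
```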

8.2. Caddy - A New IC Layout System

We have undertaken the development of a new IC layout system called
Caddy. The system has three overall goals, based on problems experienced with
our earlier systems. The first goal is to integrate design rule and circuit
information into the layout editor in order to provide incremental design rule
checking and circuit extraction. This additional expertise will permit interactive
compaction and stretching of layouts. The second goal is to move away from
fabrication details by eliminating the need for designers to specify implants and
wells and contact details explicitly. These layers will be generated automatically
by the system (the result is much like "sticks" except that it is fleshed out). The
third goal is to provide interactive semi-automatic routing aids. In this respect,
our goal is not to invent new algorithms and paradigms, but to find powerful
ways of embedding existing techniques into an interactive design environment.
Initial discussions were held in the Spring and Fall of 1982, during which the
underlying data structures and algorithms ("corner stitching") were developed.
Between January and April of 1983 the detailed design of the system was
completed and implementation was begun. We expect to use a bare-bones system in
the development of the SOAR chip in the Spring of 1983.

[1] J.K. Ousterhout: "Crystal: A Timing Analyzer for nMOS VLSI Circuits," Third
Caltech Conference on VLSI, Computer Science Press, 1983, pp. 57-70.

[2] R.N. Mayo and J.K. Ousterhout: "Pictures with Parentheses: Combining
Graphics and Procedures in a VLSI Layout Tool," 20th Design Automation
Conference, to appear.


9. BERKELEY ADVANCED CMOS

Fabrication of the first round of devices using the Berkeley Advanced CMOS
Process is virtually complete. A combined mask with typical active devices and
latch-up structures is being used. Delays were encountered due to the
remodeling of the IC Laboratory and problems in pattern generation of the masks.
Only the contact etching and metalization remain before electrical
characterization can be started.


9.1. Simulation Aids for Viewing Wafer Topography from Layout

A top down design of the topography simulator SEMPLE has been defined
and dry lab tested using the Berkeley advanced CMOS process. The dry lab testing
consisted of walking through 49 processing steps, following the evolution of the
two dimensional cross section of the device and determining the processing
information needed at each step for defining the profile. The dry lab device
profile has been displayed on KIC and photographed as an example of the kind of
display which can be generated. The approach for implementing SEMPLE consists of
working from CIF layout files to produce profile description files. This choice
gives maximum portability and allows SEMPLE and the local layout system to be
implemented as subsets of each other. The programming in "C" is being
initiated and will proceed from Manhattan geometry to arcs, and finally to
features linked to SAMPLE and SUPREM.


10.  TECHNOLOGY  (Oldham,  Brodersen,  Hu  and  Cheung) 


10.1.  Soft  Error  Studies 

Computer  analysis  of  the  effects  of  surface  condition  on  the  is  surrounded 
by  reflecting  surface,  charge  collected  may  be  assumed  an  absorbing  surface 
model  which  assumed  an  absorbing  surface.  A  fuller  presentation  of  this  work 
has been submitted for publication. We have also made preliminary
considerations of the soft error problem in GaAs ICs. There are theoretical
reasons to expect significant soft error problems in them.

10.2.  Hot  Carrier  Currents  in  Scaled  MOSFETs 

We have published the finding that photons are generated in the high field
region in the channel and that these photons subsequently create minority
carriers in the substrate [2]. We have carried out further theoretical and
experimental studies and now believe that bremsstrahlung is the mechanism of
photon generation. We have also photographed the light emission from operating
MOSFETs and used this technique to study the uniformity of the electric field
along the width of MOSFETs. This work will be reported in June [3]. We have
published a study of MOSFET breakdown [4] and of the characteristics of MOSFETs
near and in the breakdown regime [5]. A full paper on the latter subject will
appear soon [6], as will a paper on the punchthrough of MOSFETs [7]. We have
demonstrated a universal correlation between the gate and the substrate and
minority leakage currents. These results were reported at the 1983 ISSCC [9].
Finally, these correlations have been found to hold in 0.15 um n-channel MOSFETs.

10.3.  Thin  Oxide  Studies 

Special system, software, and measurement techniques have been
developed to characterize the generation and filling of traps in SiO2 and at the
Si/SiO2 interface. Degradation of 10 nm-gate MOSFETs due to Fowler-Nordheim
tunneling was published [11]. This work has been extended to degradation due
to substrate hot electrons and due to channel hot electrons. Several papers are
being prepared to report the findings. Modeling of the I-V characteristics of very
thin oxides will be published in April in the Netherlands [12] and in May in San
Francisco [13]. Recently, we have completed a literature review on the subject
of time-dependent breakdown of oxides. We have postulated and are examining
a physical model for this phenomenon.

10.4.  Latchup  Studies 

Test structures have been developed to investigate the latchup problem.
We have measured the distribution of potential at or near the surface around a
point source of current. The purpose is to quantify the dependence of the
triggering current and latchup susceptibility on layout, scaling, and the
guard-ring design. The results are supplemented with simulations and simple
theoretical analysis. A paper is being prepared. Experiments are underway to study
the impact of field implant and the use of epitaxial substrates and a novel way of
replacing the epitaxial substrate with a blank, high-energy deep implant.

10.5. As Implant Anomalies

Anomalous negative threshold shifts are observed following annealing of
thin oxides implanted with arsenic. Thermally grown oxide (650 Å) is implanted
with arsenic at 25 keV (Rp = 150 Å, ΔRp = 50 Å) with a dose of 10e13 cm^-2. After
annealing, a negative threshold shift of up to -5 volts is observed for n+
polysilicon gate structures. Other experiments suggest penetration of the oxide
by arsenic, despite predictions of LSS scattering and arsenic diffusion [1].

10.6. Low Temperature MOS Electronics - CODMOS

We present a new depletion mode n-channel device structure which
functions over the temperature range 77° - 400° K. This charged-oxide depletion
MOS device (CODMOS) uses cesium ion implantation of the silicon dioxide of the
MOS structure. Cesium ions have shown great stability in SiO2 under temperature
bias stressing. The presence of these positive ions in the gate oxide turns on a
channel under zero gate voltage. This structure is expected to overcome the donor
freeze out problem observed in conventional depletion devices and to show
improved subthreshold and substrate sensitivity behavior [2].


A standard local oxidation n-channel process has been used for fabrication
of test devices. Low energy (40 keV, Rp = 190 Å, ΔRp = 50 Å) Cs+ ions are
implanted after growing 650 Å gate oxide in 1000°C dry O2 with doses of 3.46e12
and 8.92e12 cm^-2. Conventional depletion devices are also fabricated by
(120 keV, 1.488e12 and 2.8e12 cm^-2) arsenic implantation of SiO2.

It is observed that the activation of cesium implanted ions depends on the
post implantation annealing cycles. Cs+ implanted devices have been fabricated
with a room temperature threshold voltage of -6.4 volts which shifts to -3.2 volts
at 77° K. This compares with a 99% positive shift for conventional depletion
devices. It is suspected that freeze out of the interface states generated by Cs+
ion implantation is the cause of threshold shifts in the CODMOS devices. Better
substrate sensitivity is observed for CODMOS devices than for conventional
depletion devices.

Successful CODMOS device test results confirm the feasibility of this design
technique in fabrication of high performance n-channel depletion mode devices,
especially when low temperature operation is desired.

10.7. Channeling Effect of Low Energy Boron Implant in (100) Silicon

The unintentional ion channeling in low energy ion implantation of boron
into (100) silicon results in much deeper junctions than predicted by LSS theory,
even for wafers tilted 8 degrees from the normal. This partial channeling effect
imposes a limitation on the achievable p-type shallow junction depth. Both B+
and BF2+ implants show the effect. The characteristics of this partial
channeling profile are under investigation. An empirical formula was found to
describe the enhancement of the junction depth. Experiments also showed that the
deep penetration tail could be suppressed by a silicon pre-implant to drive the
silicon surface layer amorphous.


[1] T.M. Liu and W.G. Oldham, "Channeling Effect of Low Energy Boron Implant
in (100) Silicon", to be published in Electron Device Letters, March 1983.


10.8. Al/Si Contact Electromigration

An important issue in VLSI technology development is to achieve reliable
shallow junctions where Al/Si contact spiking is a major concern. In NMOS
technology, this concern can often be relieved by introducing a phosphorus plug
into the n+ contact. However, in CMOS technology, the introduction of a plug would
increase the complexity of the already complicated process. A study of Al/n+
and Al/p+ contact electromigration is necessary to find the optimum
CMOS device structure with the shallowest possible junction.

An automatic measurement setup based on an IBM Personal Computer has been
configured. The materials of the test packages have also been chosen to operate
at temperatures of about 250°C without failure.


